Core: stop serializing deprecated schema last column id#16927
Open
minleejae wants to merge 1 commit into
Open
Conversation
dbfc716 to
5f1536b
Compare
Author
|
Hi @Fokko, could you take a look when you have a chance? I think this is a follow-up to #11514 and #13850. The original direction was to stop exposing I added a REST server apply-path test to verify that omitting the field preserves the table-level |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
AddSchemaUpdate.last-column-idis deprecated and optional in the REST OpenAPI spec. Java already ignores this field when parsing add-schema updates and computes the table-level last column ID internally.However, Java still serializes the deprecated field using
schema.highestFieldId(). After dropping the column with the highest field ID, this value can decrease even though table metadatalast-column-idmust not decrease. REST catalog implementations that validate the deprecated request field can reject otherwise valid schema evolution updates, as reported in #13850.This does not remove table metadata
last-column-id; it only stops sending the deprecated optional field inAddSchemaUpdate. Field-id conflict validation is still carried byAssertLastAssignedFieldIdrequirements.Compatibility notes
AddSchemaUpdate.last-column-idas deprecated and optional, and says that when it is omitted it will be computed on the server side: https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L3116-L3135TableMetadata.Builder.addSchemapreserves the table-level last column ID withMath.max(lastColumnId, schema.highestFieldId()): https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/rest/CatalogHandlers.java#L613-L659 and https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/TableMetadata.java#L1136-L1138OptionalInt) and its changelog notes the AddSchemalastColumnIddeprecation/optional change. When the value is omitted, Nessie computes from the schema max field ID and only increases the stored table-level value: https://github.com/projectnessie/nessie/blob/8f3a35c103ccf98fad9a95c22f2cf50993026a7f/catalog/format/iceberg/src/main/java/org/projectnessie/catalog/formats/iceberg/rest/IcebergMetadataUpdate.java#L348-L371 and https://github.com/projectnessie/nessie/blob/8f3a35c103ccf98fad9a95c22f2cf50993026a7f/catalog/format/iceberg/src/main/java/org/projectnessie/catalog/formats/iceberg/nessie/NessieModelIceberg.java#L1198-L1250UpdateTableRequest/MetadataUpdatemodel for commits and applies updates viaMetadataUpdate.applyTo(...), so it follows the Iceberg parser/model behavior instead of requiring the deprecated request field. Its copied REST spec also marks the field deprecated and optional: https://github.com/apache/polaris/blob/e3f26cf69f6ee189328f1b7adb90ec83cbbb26d9/api/iceberg-service/src/main/java/org/apache/polaris/service/types/CommitTableRequest.java#L19-L23 and https://github.com/apache/polaris/blob/e3f26cf69f6ee189328f1b7adb90ec83cbbb26d9/runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/CatalogHandlerUtils.java#L408-L522 and https://github.com/apache/polaris/blob/e3f26cf69f6ee189328f1b7adb90ec83cbbb26d9/spec/iceberg-rest-catalog-open-api.yaml#L2969-L2988tabulario/iceberg-restreference image is a thin wrapper around Apache Iceberg'sRESTCatalogAdapter, so it does not add a separatelast-column-idvalidator. Its exact behavior is tied to the bundled Iceberg version, which is why the fix belongs in Iceberg's REST metadata update serialization/model: https://github.com/databricks/iceberg-rest-image/blob/62a5078032e0675d34b377d2ead2eea814c1da48/src/main/java/org/apache/iceberg/rest/RESTServerCatalogAdapter.java#L28-L43Catalogs that still send the deprecated field remain supported because parsing is unchanged. The omission path is part of the REST contract; server implementations that need the table-level last assigned field ID should use existing table metadata and/or
AssertLastAssignedFieldIdrequirements rather than depending on the deprecatedAddSchemaUpdatefield.What changed
last-column-idforadd-schemametadata updates.last-column-id.Testing
JAVA_HOME=$(/usr/libexec/java_home -v 21) ./gradlew :iceberg-core:test --tests org.apache.iceberg.TestMetadataUpdateParser --tests org.apache.iceberg.rest.requests.TestUpdateTableRequestParser --tests org.apache.iceberg.rest.TestCatalogHandlers --console=plain --warning-mode=noneJAVA_HOME=$(/usr/libexec/java_home -v 21) ./gradlew :iceberg-core:spotlessJavaCheck --console=plain --warning-mode=none